Adaptive optimal control of entangled qubits

Developing fast, robust, and accurate methods for optimal control of quantum systems comprising interacting particles is one of the most active areas of current science. Although a valuable repository of algorithms is available for numerical applications in quantum control, the high computational cost is somewhat overlooked. Here, we present a fast and accurate optimal control algorithm for systems of interacting qubits, QOALA (quantum optimal control by adaptive low-cost algorithm), which is predicted to offer O(M2) speedup for an M-qubit system, compared to the state-of-the-art exact methods, without compromising overall accuracy of the optimal solution. The method is general and compatible with diverse Hamiltonian structures. The proposed approach uses inexpensive low-accuracy approximations of propagators far from the optimum, adaptively switching to higher accuracy, higher-cost propagators when approaching the optimum. In addition, the utilization of analytical Lie algebraic derivatives that do not require computationally expensive matrix exponential brings even better performance.

A crucial ingredient in these optimal control applications is a numerical method for computing the dynamics of spin systems, which is used for computing the objective function at each iteration of an optimization algorithm. A common feature in most numerical methods is that a uniformly and highly accurate method is used throughout the optimization process when, for the most substantial part, the optimization cannot benefit from the provided accuracy and hence suffers from the computational burden without much gain. This problem becomes more evident and cumbersome, especially in the context of large, multiparticle systems, and therefore more important to address.
Here, we present a general and highly flexible approach for solving optimal control problems for systems of entangled qubits in a computationally efficient manner without sacrificing the desired accuracy. In an optimization process, computationally inexpensive methods for computing spin dynamics can be used far from the optimum, where loss of accuracy is less crucial and where the most iterations in a numerical optimization routine are often spent. Conversely, high-accuracy methods should only be used for computing the dynamics of quantum systems close to the optimum when high fidelity is achievable and desired. This adaptive optimal control method, QOALA (quantum optimal control by adaptive low-cost algorithm), uses a set of approximate propagators that are designed to allow variable degrees of trade-off between accuracy and computational expense and achieves significant speedup in practice. We elucidate the potential speedup of QOALA with the concrete example of a class of numerical methods called propagator splittings. However, the overall framework developed here is flexible enough to incorporate any combination of numerical methods with different cost and accuracy trade-offs. This paper is structured as follows: In the "Formulation of optimal control problem," "Termination criteria," and "The adaptive procedure" sections, a brief and general theory of optimal control is presented, and the concept of adaptive approach is introduced. In the "QOALA with propagator splitting" section, the ingredients of the QOALA method including the Hamiltonian structure, propagator splitting, and computation of derivatives are presented, along with potential benchmarking strategies. Lastly, in the "Numerical demonstrations" section, demonstrations of the proposed method are presented in the context of NMR using a range of examples including state transfers and swap gates on two-, three-, and four-spin systems. In each case, the convergence and wall-clock time of the method are compared to methods (28)(29)(30) available via the versatile software package Spinach (31). It is demonstrated that the proposed adaptive framework consistently outperforms available methods. Relative speedup of QOALA compared to an exact method is shown in Fig. 1.

Formulation of optimal control problem
The state of a system with M spins at time t is described by a density matrix ρ(t), and its dynamics are governed by the Liouville-von Neumann equation where ρ 0 is the initial state of the system at time t 0 , and Hamiltonian superoperator of the system. The solution of Eq. 1 can be expressed as Formally, the propagator U(t) is a time-ordered exponential of À iH denoted as In optimal control applications, the Hamiltonian, HðtÞ, depends on some control parameters, θ, and optimal choices of these parameters are sought, which can either (i) drive the state of the system ρ(T ) at a specified time T to a desirable target ϱ or (ii) ensure that the propagator U(T ) implements a desirable propagator U target . To highlight the dependence on θ, we often write H(t; θ), U(t; θ), and ρ(t; θ) instead of H(t), U(t), and ρ(t), respectively, where normally, t ∈ [0, T ]. Further, we assume that H(t; θ) and U(t; θ) are represented by N × N matrices.
The optimal control problem is expressed in terms of maximizing an objective function where f is a continuously differentiable real-valued functional of propagator matrices, i.e., Here, we are particularly concerned with cases where the value of the objective at the optimal parameters is known a priori. Without loss of generality, we assume that, after suitable scaling of f, the objective F is able to achieve the maximum possible value of 1 A large class of objective functions F of this form that appear in the context of optimal control of spins and gate design are fidelity functions. For instance leads to the fidelity functional which measures the overlap of the quantum state ρ(T; θ) at a specified time t = T, with a target state ϱ. Here, both ρ 0 and ϱ are assumed to be improper [i.e., trace-free, Tr(ρ) = 0], normalized (‖ρ‖ 2 = 1), and Hermitian (ρ † = ρ) density matrices. Functionals such as Eq. 8, where an initial state ρ 0 appears, lead to fidelity functions that are used in the context of state-to-state transfers, i.e., case (i).
Here, normalization is only assumed so that value at the optima is 1, i.e., to satisfy Eq. 9. Note that the requirement of trace-free density matrices only applies to the fidelity functional described in Eq. 9, and the overall approach applies equally to fidelity functions of proper density matrices (i.e., non-trace-free density matrices).
The functional leads to the fidelity which does not depend on initial state ρ 0 and can be used where an effective design of a specified propagator U target (a desired unitary gate, for instance) is required, i.e., case (ii). Note that, instead of working with the real value of the trace in Eqs. 8 and 10, i.e., applying the function Re( · ), we can also apply any other continuously differentiable function, e.g., absolute value squared, | · | 2 . Fidelity functions measure the similarity between mixed quantum states. A very wide range of fidelity functions exist, among them notably the Uhlmann-Jozsa fidelity (32), which satisfy the fidelity axioms to different extents. We refer the reader to (33) for a more detailed discussion of the appropriateness of various fidelity measures. For the purposes of this manuscript, however, all objective functions F satisfying Eqs. 4 to 7 will be considered valid fidelity functions that can be maximized using the proposed approach.
The fidelity functions can be maximized using gradient-based optimization schemes, where one needs the gradient of the fidelity function F . We compute the gradient using the chain rule @F ðuÞ @u ¼ Df ½UðT; uÞ� @UðT; uÞ @u ð12Þ which further requires computation of the gradient of the propagator U(T; θ). Here, Df(X ) is the Fréchet derivative of f at X, which acts linearly on Y, and Df (X )Y quantifies the rate of change in the direction Y. Equation 12 can be generalized further to use Gateaux derivatives (or directional derivatives) (34,35), where instead of Df(X )Y, we write Df(X; Y ) or D Y f(X ). For instance, the action of the Fréchet derivative of the functional in Eq. 8 is Df ðXÞY ¼ Re½Trð@ y Yr 0 Þ� so that the gradient of the fidelity Eq. 9 is @F ðuÞ @u ¼ Re Tr @ y @UðT; uÞ @u r 0 The exact propagator U(T; θ) (and hence its derivatives) are not available, since an analytical solution of Eq. 3 is not available except in very specialized and restrictive circumstances. Instead, in practice, we rely on numerical methods for solving Eq. 3.
We assume that a family of numerical solvers S (1) , S (2) , …, S (L) to approximate the propagator U(T; θ) is available and arranged in increasing accuracy and cost, i.e., we assume that the solver S ð'þ1Þ is more accurate and computationally more expensive than S ð'Þ . Here, S ð'Þ is a (θ-dependent) linear map from the initial state ρ 0 to the state at time T r ð'Þ ðT; uÞ ¼ S ð'Þ ðuÞr 0 and it produces an approximation r ð'Þ (T; θ) to ρ(T; θ), the true solution of Eq. 1.
The use of a particular propagator S ð'Þ naturally leads to the computation of an approximate value of the fidelity function When a range of numerical solvers with different cost and accuracy trade-offs are available, a natural question to ask is: What is the best choice of solver? On the one hand, the most inexpensive solver S (1) makes the optimization fast. However, it also leads to a less accurate approximation to the fidelity, F (1) (θ). This becomes particularly important in applications where high fidelities (approaching 1) are feasible and desired, and using low-accuracy approximations limits the fidelity achievable. In such cases, the most accurate solver S (L) seems appealing. However, such a solver typically comes with a high computational cost. The use of such an accurate but costly solver seems less justified in the initial stages of optimization where the objective is very far from the optimal value.
Here, we propose an adaptive procedure where inexpensive solvers with low accuracy are used far from the optimum, and high accuracy solvers with large computational costs are used only closer to the optimum θ * . The framework requires the following ingredients: (i) a set of numerical solvers S (1) , S (2) , …, S (L) with different cost and accuracy trade-offs, (ii) a method for computing gradients of these solvers with respect to θ for approximation of the fidelity gradient Eq. 12, and (iii) a low-cost, adaptive strategy to switch from a solver S ð'Þ to a more accurate solver S ð'þ1Þ , which includes an error estimation of the computed fidelity F ð'Þ and a measure of proximity to the optimum θ * .

Termination criteria
In theory, a nonlinear optimization process is terminated when the distance of the jth iterate θ ( j ) from the optimum θ * becomes sufficiently small. In practice, however, because θ * is not available a priori, we estimate‖θ * − θ (j−1) ‖ by ‖θ (j) − θ ( j−1) ‖ and terminate iterations when for some user-defined tolerance tol θ . Another criterion is motivated by the fact that, at critical points (including, but not limited to, the optimum), the gradient vanishes, e.g., ∇ θ F (θ * ) = 0. Again, some user-defined tolerance tol g is used to asses this criterion as The last criterion to consider is the difference of the value of the objective F at the previous iterate θ (j − 1) from the optimal value F ðu � Þ However, we know that when a system is fully controllable, given a set of parameters, a fidelity of 1 can be achieved, i.e., F (θ * ) = 1, and therefore, for some user-defined tolerance tol F can be used alongside Eqs. 14 and 15 as a termination criterion.
Note that even if we do not know whether a fidelity of 1 is achievable in a particular application, in quantum optimal control, we are guaranteed that F ≤ 1 (36) for suitably normalized initial and target states. Thus, Eq. 17 does not lead to unnecessarily early termination and should be used alongside Eqs. 14 and 15.
The adaptive procedure An additional and important aspect in an adaptive process is the numerical error in the approximation of the objective F . We can, at best, rely on F (L) , generated using the most accurate numerical solver S (L) available to us, but we would like to use the least expensive solvers S (1) , wherever possible. Here, we consider the true fidelity F [θ ( j) ], a fidelity measure that would results from exact calculation methods, and F ð'Þ [θ (j) ], a fidelity when the numerical solver S ð'Þ is being used in the jth iteration. For ease of notation, we have suppressed the dependence on the parameters θ in this section. Using Eqs. 16 and 17, we can construct a general system of inequalities for the adaptive procedure where κ F ∈ (0,1) is a user-defined parameter. This system of inequalities enforces the termination criteria such that and ensures that The true fidelity F also satisfies the termination criterion Eq. 17. Note that κ F in Eq. 20 guarantees that at all times, during the optimization procedure, we keep the numerical error several times smaller than |1 − F ð'Þ , which measures the distance from optimal value [assuming F (θ * ) = 1].
As we approach the optimal value, we need to use more accurate solvers. When Eq. 20 is violated, the optimizer switches to a more accurate solver S ð'þ1Þ . From the left side of Eq. 20, the error of the fidelity approximation F ð'Þ using the solver S ð'Þ should be computed Because the true fidelity F is not available, we use a more accurate scheme for approximating F The most inexpensive estimate is given by The computation of 1 ð';'þ1Þ as in Eq. 21 involves both S ð'Þ and S ð'þ1Þ and therefore is more than twice as expensive as using S ð'Þ alone. A simple way to mitigate this problem is to only make the decision to switch from S ð'Þ to S ð'þ1Þ once every 5 to 10 iterations, which reduces the overall burden of computing 1 ð';'þ1Þ . However, the gap between tests should not be too large to avoid the fidelity deviating from the true fidelity. Although other methods like defect-based error estimators (37) can provide a highly accurate estimate of the error in S ð'1Þ without the use of a more accurate scheme S ð'2Þ ð' 2 . ' 1 Þ and can be easily incorporated in our framework, as our current application does not require a very high accuracy of error estimation, we restrict the present implementation to the estimator presented in Eq. 21.

QOALA with propagator splitting
We start with presenting the total Hamiltonian in terms of interaction (H in ) and single-spin (H ss ) sub-Hamiltonians where δt is a small change in t, derived from the differential in Eq. 3. The Hamiltonian for an isolated (noninteracting) spin can be written as Any other parts of the Hamiltonian, e.g., interactions, can also be included in H in . Thanks to utilization of superoperator formalism Table 1. Summary of Trotterized splitting methods as solvers to approximate e AþB . A pictorial representation of S 2,3 is shown in Fig. 2. The computational cost, O cost , is of a forward time propagation to obtain U(T; θ) for a fidelity calculation in Eq. 4. and These decomposition and separation play a key role in the mechanism of the proposed method. In principle, one can use any set of solvers for the approximation of the propagator, ordered in ascending accuracy and cost. In this work, we restrict our attention to a range of splitting methods, which are numerical solvers that approximate e A þ B in terms of products of e A and e B . The idea of approximating a propagator using splitting techniques is well explored in the literature (38)(39)(40)(41)(42)(43)(44). Propagator splittings of arbitrarily highorder accuracy can be derived using techniques such as Baker-Campbell-Hausdorff formula (45), Zassenhaus splitting (46), Magnus expansion (47), and their combinations. In what follows, a specified solver using a splitting method of order p, having an accuracy O(δt p ), will be denoted with a subscript S p .
Moreover, any splitting method S p can be combined with Trotterization (38), which divides a small time slice of duration δt into q equal, smaller, time slices of durations dt q for improved numerical accuracy, to generate a larger group of solvers where the Trotter number q is now also included as a subscript in the solver, S p,q . Splitting methods used as solvers in the present work are summarized in Table 1, with a schematic diagram outlining second-order splitting, p = 2, with Trotter number q = 3 shown in Fig. 2.
The accuracy of each solver is known, and since the dimensions of the Hamiltonian do not change in the course of the optimization, we can consider the number of matrix-vector multiplications as a measure for cost. Using these known attributed accuracy and cost, we can obtain a generic graph of cost versus accuracy for our set of solvers (Fig. 3).
Let us recall the definition of fidelity in Eq. 4 which is the objective function to be minimized in the optimization. Although the scope of the QOALA method is not limited to the GRAPE algorithm (22), here, it is presented in the context of the GRAPE method, where the time-dependent control pulses are generally presented as piecewise constant over a small time interval δt u a;k ðtÞ ! ½u a;k;1 u a;k;2 � � � u a;k;N � with a ∈ {x, y} controlling the kth spin. For notational convenience, the explicit time interval dependence of θ a,k,n (δt) has been dropped. In this context, the effective propagator over the total pulsing time duration, T = Nδt, is given by where θ n is a general control parameter solely affecting the nth propagator, U n , and the shorthand θ n = θ a,k,n is valid for any a ∈ {x, y} and k ∈ {1, …, M}. Note that for pulses that are not piecewise constant, the decomposition Eq. 28 does not hold exactly. However, similar decompositions can be obtained for arbitrary pulses using the Magnus expansion (47), for instance. The gradient of the true fidelity, i.e., the derivative of the fidelity at every time point n with respect to a discrete control, θ n , can be written as However, because of computational costs, we never compute the true fidelity Eq. 4 or its gradient Eq. 29. The central idea presented here relies on the approximation of U n ¼ e À iðdtÞHðu n Þ using a solver S ð'Þ,n and of the propagator U(T; θ) = U tot by S (l) Thus, instead of Eq. 4, we compute the approximate fidelity and the exact gradient of the approximate fidelity Eq. 32 where @S ð'Þ ðuÞ @u n ¼ S ð'Þ;N � � � S ð'Þ;nþ1 @S ð'Þ;n @u n S ð'Þ;nÀ 1 � � � S ð'Þ;1 ð34Þ As a concrete example of approximate solvers, S ð'Þ , we consider the use of the splitting propagators introduced in Table 1. The nth propagator U n is approximated by a split propagator, S p,q (θ n ) = [S p (θ n )] q . For ease of notation, we drop the dependency on θ n . Assuming an odd number of multiplications, 2P + 1, in a general solver, S p = S 2P+1 S 2P ⋯S 2 S 1 (making a distinction between the solver, S p , and one of its constituent matrix exponentials, S), and by considering that A and B appear in odd and even S terms, respectively (without loss of generality, as presented in Table 1), i.e., and given the fact that @A @u n ¼ 0, @B @u n = 0, and @B @u n ; B � � = 0, for a general solver, S p,q , with Trotter number q, we have where, using S p = S 2P+1 S 2P ⋯S 2 S 1 @S p @u n ¼ @S 2j @u n can be computed using finite difference (48) or exact (30) methods. While finite difference approximations of the gradient are easy to program, they can also be inaccurate, expensive, and most importantly, in the present context, unstable when the underlying numerical propagator has low accuracy (49,50). Here, taking advantage of the decomposition presented in Eq. 26, we use analytical Lie algebraic derivatives using the recent ESCALADE (efficient spin control using analytical Lie algebraic derivatives) method (49), which, in contrast to the auxiliary matrix method (29,30,(51)(52)(53), does not require the computation of an expensive matrix exponential, resulting in an additional significant speedup.

Numerical demonstrations
The state transfer problem is investigated for two different spin systems: a two-spin system, labeled system 1 in Table 2 and inset in Fig. 4D, and a three-spin system, labeled system 2 in Table 2 and inset in Fig. 4E. The two-spin and three-spin systems are both heteronuclear with 2M controls (x and y controls on each spin) and pulse B 1 amplitude limited to 1 kHz. Heteronuclear two-spin and three-spin systems are tasked with transferring an initial z magnetization from the first spin to a desired z magnetization at the final spin. The two-spin system is optimized over a pulse duration of T = 10 ms, and the three-spin system is isoptimized over T = 22 ms, both systems are allowed a time slice of δt = 0.2 ms.
The optimal control task of a four-spin system is purposefully made more difficult: The initial state is set as the entangled multispin state defined in Eq. 45; a relaxation term is included in the uncontrollable part of the Hamiltonian. The relaxation term is calculated using Spinach (29,31,54,55) Bloch-Redfield-Wangsness relaxation theory with isotropic rotational diffusion (correlation time τ c = 50 ps). The system, labeled system 3 in Table 2 and inset in Fig. 4F, is a mixture of heteronuclear and homonuclear systems and is controlled with four controls (x and y controls on each type of spin), limited to 10-kHz pulses, and optimized over a pulse duration of T = 100 ms with a time slice width of δt = 0.02 ms.
The convergence of the two-spin system follows a typical quadratic convergence, which can be seen from the average convergence of the exact method (dashed black lines) and in Fig. 4A. The interquartile range is also shown (solid gray area bounded by thin dashed lines), with small deviations from the average indicating a predictable/robust optimization. This can be considered an easy optimal control problem. However, at the time slice width of δt = 0.2 ms, the second-order splitting solver S 2,1 fails to reach a fidelity above 99%. A similar behavior is observed for the threespin and four-spin problems, when using the S 2,1 solver, but with a fidelity limited to approximately 90%. The difficulty of these two optimal control problems is evident from the less than quadratic convergence of the exact method, which, after the initial damped stage of convergence, has a lower acceleration of convergence when compared to Fig. 4A.
Higher-order solvers, S 3,1 and S 4,1 , increase the limit on achievable fidelity for all spin systems in Fig. 4 with an additional increase in computation time. Taking advantage of the short computation time of low-accuracy solvers when allowable, the QOALA algorithm switches to higher-accuracy solvers only when needed. Given a switching threshold tolerance of κ F = 0.5 in Eq. 20 and a switching decision every 10 iterations, the convergence of QOALA in Fig. 4 does converge to the high fidelities of the high-accuracy solver S 4,1 while benefiting from the time savings of the low-accuracy solvers at the beginning of the optimization. Furthermore, the average convergence trajectory of the two-spin system in Fig. 4A   Fig. 2. Pictorial schematics of time propagator splitting and Trotterization. A schematic diagram of a coupled two-qubit system, an angle-axis representation of ESCALADE (efficient spin control using analytical Lie algebraic derivatives) (49), a pictorial representation of second-order operator splitting, and Trotterization of time slices with q = 3. This corresponds to S 2,3 in Table 1. The final image shows a Trotterized, split operator, time propagation with the leftmost and rightmost split coupling of the operator splitting method, allowable for S 2,q , S 3,q , and S 4,q in Table 1, combined into one exponential.  Table 1; arrows and S p,q terms represent switching points between members of the solver set. follows the average convergence trajectory of the exact method very closely. The first and third quartiles are also closely aligned, showing both the exact and adaptive methods converge in a predictably quadratic fashion from different, random, initial guesses. From Fig. 4 (B and C), the switch to S 3,1 looks to have happened a little too late, following S 2,1 for too many iterations and losing a small amount of convergence acceleration in the process. Even considering this small loss in potential time saving, all spin systems achieve an appreciable time saving compared to the exact method. Emphasizing this point further, for a four-spin system, depicted in Fig. 4F, the exact method takes more than 1.5 hours to achieve a fidelity of 99.99%, whereas QOALA takes less than 7 min to achieve the same fidelity. The universal swap gate problem, with the fidelity defined in Eq. 46, is investigated for two different heteronuclear spin systems: a two-spin system, labeled system 1 in Table 2 and inset in Fig. 5D. The potential time saving should be more for this type of optimization, because the efficient Krylov propagation of the exact method in state transfer problems is not appropriate here, and an explicit matrix exponential must be made in gradient calculations. In this case, the exponential of a matrix is evaluated using a Taylor series, with scaling and squaring (35), and truncated with a predefined tolerance of 1 ×10 −12 . Both problems have 2M controls (x and y controls on each spin), with pulse B 1 amplitudes limited to 1 kHz. The adaptivity of the QOALA is set with a switching threshold tolerance of κ F = 0.5 (Eq. 20), a switching decision every 10 iterations. In the context of our set of examples, we observed that universal swap gate problems require slightly longer pulses than the state transfer problems. The two-and three-spin systems are optimized with T = 12 and 26.4 ms, respectively. An example of the effect of a three-spin swap gate is shown in Fig. 6, showing the first and third spin states swapped, and the state of the second spin is left unchanged.
In comparison to the state transfer problems in Fig. 4, the inclusion of Trotterization (S 2,2 ) in Fig. 5 makes the convergence trajectory more stable by avoiding using the solver S 2,1 for too many iterations, following the average convergence trajectory of the exact method closely in the case of the three-spin system in Fig. 5B and even finding a shortcut to convergence for the twospin system in Fig. 5A. Again, both methods converge in a predictable fashion from initial random guesses, shown by the first and third quartiles. Figure 7 shows the speedup achieved with the QOALA method for these two universal swap gate optimizations. It shows an even greater (an additional twofold) speedup than for state transfer problems in Fig. 1. This is because the efficiency of Krylov propagation cannot be used for propagator derivative calculations in the exact method. The slowdown from using matrix-matrix products, as opposed to efficient matrix-vector products of the state transfer problems, is inherited by both QOALA and the exact method.

Performance
The QOALA method starts with low-cost, low-accuracy approximations of propagators far from the optimum and adaptively switches to high-cost, high-accuracy approximations as it approaches the optimum. With a set of approximant propagators, the QOALA method offers a versatile trade-off mechanism between accuracy and computational cost.
Note that the speedup achieved with the QOALA approach compared to the exact method depends on the type of optimal control problems and the system under study, but we predict that an ∼O(M 2 ) speedup is expected for a system consisting of M qubits while maintaining the accuracy of the optimal solution, as summarized in Fig. 1. High-order splitting methods are required at the highest fidelities, and the computation time of these high-order splitting methods dominate the overall computation time. The speedup tends to be O(M ) at these high fidelities because of this domination and because there is a splitting order/Trotter number pair that gives the same accuracy as the exact method with a matrix exponential.

Benchmarking
An alternative approach to error estimation presented in "The adaptive procedure" section is benchmarking of the solvers before the optimization. For any solver in the set of solvers, the cost and achievable accuracy is known. Precomputed benchmarking consists Table 2. Spin system parameters for four different spin systems, labeled system 1 to system 3 in the first column. The second column refers to a numeric label, k, for each spin in the system, with the isotope of that spin in the third column. The resonance offset (chemical shift), Ω k , is shown for each spin in hertz in the fourth column. Coupling, J jk , between different spins, labeled j and k, is shown as a list in the fifth column in hertz. The static magnetic field, B 0 , is shown in the sixth column, and the nominal power level of the control pulses, B 1 , is shown in kilohertz in the seventh column. The final column shows the number of control channels. Simulations use a penalty function to ensure that control pulses do not exceed to nominal power level, ±B 1 , of the control pulses. of identifying switch points on a cost versus accuracy plot that minimizes the overall cost while maximizing the accuracy. If we want to apply this benchmarking strategy to the set of solvers in Table 1, then it can be seen that, with efficient matrix caching, the minimum cost of the computation of S p = S 2P+1 S 2P ⋯S 2 S 1 and its derivative in the presence of qth order Trotterization is (4 + K )Pq + 2, where K is the number of controls. For example, the plot of Fig. 3 shows that there is no benefit in using second-and third-order solvers with Trotter numbers larger than 2 and 1, respectively.

Concluding remarks
We presented QOALA, a fast, accurate, general, and highly flexible optimal control algorithm, for solving optimal control problems involving systems of entangled qubits in a computationally efficient manner without sacrificing the desired accuracy of the optimal solution. The results of this manuscript can be generalized in a straightforward manner to functionals that involve the values of the density matrix ρ at multiple points, as well as functionals that involve regularization and penalty terms.

Liouville space
In this work, we use the spherical tensor basis (56) in a Liouville space, allowing a general implementation to include effects such as relaxation or decoherence. Using the Liouville-Hilbert dual relationship The method is formulated in superoperator formalism

Hamiltonians
Although the theoretical basis of this work can be extended to any quantum systems, it is formulated and demonstrated in the context of NMR, for a spin system consisting of M spin − 1 2 particles interacting with each other via scalar coupling J and under a radiofrequency pulse. The noninteracting, single-spin part of the Hamiltonian can be written as Therefore, we can prepare this Hamiltonian for our adaptive approach by where θ x and θ y are the x and y components of the time-dependent pulse amplitudes. In the case of heteronuclear coupling, the spin system can be considered weakly coupled, reducing the interaction Hamiltonian to Auxiliary matrix method One method to find the time propagators and analytical propagator derivatives is to use an auxiliary matrix method (29,30,(51)(52)(53) for computing the matrix exponential of a block triangular matrix where C a,k = − iδtI a,k for a control pulse amplitude θ a,k,n of duration δt. The propagators can be recovered from one of the diagonal blocks, and the propagator derivatives with respect to θ a,k,n , in the direction of the control operator I a,k (Eq. 40), can be extracted from the upper right block. For state transfer problems, this method should also use efficient Krylov propagation, avoiding the need to explicitly compute the matrix exponential (57)(58)(59).

Optimal control problems
Three types of optimal control problem are compared in this work. z to z transfer The goal is to transfer an initial z magnetization on a spin j to z magnetization on a spin k: I z,j → I z,k . Magnetization on each spin can be depicted in a simple way through Cartesian coordinates on each spin's Bloch sphere, as in Fig. 2. Here, we use the state-to-state transfer fidelity Eq. 9, F = Re[Tr(ϱ † U tot ρ 0 )].

Entangled to z state transfer
An initial two-spin entangled state between two spins j and k asks the optimization to collapse the entanglement to a desired localized z magnetization on a further spin l: E (j,k) → I z,l . The normalized, entangled, two-spin state between spins j and k is defined as where 1 is a unit state, a trivial but complete addition to the basis set, and I ± = I x ± iI y . Once again, the fidelity used is Eq. 9, F = Re [Tr(ϱ † U tot ρ 0 )].

Universal swap gate
A universal multispin operation designed to swap the arbitrary initial magnetization between two spins, j and k, is termed as a swap gate: I (j) ⇌ I (k) . The swap gate can be found by optimizing to a desired effective propagator, with a fidelity defined through only the effective propagator of the target where U swap is the effective propagator of the swap operation (60). This fidelity function is of the form Eq. 11, and the overlap is now Comparison of the convergence of the two-spin and three-spin swap-gate with solvers S 2,1 , S 2,2 , and S 3,1 (solid colored) and QOALA using those same solvers adaptively (solid black) to an exact method (black dashed) (A and B) and the corresponding wall-clock time of computation (C and D). The two-spin system in (A) and (C) is given a pulse duration of T = 12 ms, and the three-spin system, swapping the first and third spin, in (B) and (D) is given a pulse duration of T = 26.4 ms. Both have a time slice width of δt = 0.1 ms. In all plots, a central thick line is an average over 28 optimizations, and the surrounding shaded area is bound by the first and third quartiles. Adaptive splitting order and Trotter number selection (solid black) allow the optimization to follow an inexpensive route through a convergence trajectory: using approximate methods far from the optimum and accurate methods close to the optimum.
defined in a more general way by the Hilbert-Schmidt inner product. The real part disregards a trivial global phase that does not affect the swap gate operation. A number of different spin systems are set out in Table 2, which are used to investigate convergence and benchmark wall-clock time of QOALA in comparison to exact calculations of propagators and propagator derivatives using the auxiliary matrix method (29) of Eq. 44. As mentioned above, state-to-state simulations for the exact method were performed using the efficient Spinach implementation (31) of Krylov propagation within MATLAB.

Optimization strategy
All results were obtained using a workstation running CentOS 7 with a 14-core Intel Xeon W-2175 CPU at 2.50 GHz and 64 GB of RAM. All optimizations were run 28 times, from 28 different random initial pulse guesses, with their convergence trajectories and time of computation (the wall-clock time), and averaged to show typical convergence and computation behavior. The l-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) gradient following optimization method is used in all cases, with the 25 most recent gradients forming a Hessian approximation (28). A bracketing and sectioning line search method (with cubic interpolation) calculates an appropriate step length using strong Wolfe conditions (50). When an adaptive method switches between solvers, a step length of 1 is selected, with no further line search at that iteration. Termination conditions of optimization are tol θ = 1 × 10 −12 in Eq. 14 and tol g = 1 × 10 −12 in Eq. 15. All optimal control problems are given sufficient pulse duration to achieve high fidelity (18) and enforce a pulse amplitude spill-out penalty beyond 1 kHz (or 10 kHz for four-spin examples) (30), and fidelities are normalized so the globally optimal fidelity is unity. Essentially, there is the requirement that there are enough controls (enough time slices) to control the system to its desired target. This is a requirement for both an exact method and the solver-based methods, S p,q , presented here. Accordingly, the control problems are designed to have enough time slices to attain good convergence, but no more.